VC dimension of ellipsoids 



Yohji Akama^{a,*}, Kei Irie^{b}

^a Mathematical Institute, Tohoku University, Aoba-ku, Sendai, Miyagi, 980-8578, Japan
+81-(0)22-795-n08(tel) +81-(0)22-795-6400(fax)
^b Department of Mathematics, Kyoto University, Kyoto, 606, Japan



Abstract 

We will establish that the VC dimension of the class of $d$-dimensional
ellipsoids is $(d^2 + 3d)/2$, and that maximum likelihood estimation with
$N$-component $d$-dimensional Gaussian mixture models induces a geometric class
having VC dimension at least $N(d^2 + 3d)/2$.

Keywords: VC dimension; finite dimensional ellipsoid; Gaussian mixture 
model 



1. Introduction 

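For sets $X \subseteq \mathbb{R}^d$ and $Y \subseteq X$, we say that a set
$B \subseteq \mathbb{R}^d$ cuts $Y$ out of $X$ if $Y = X \cap B$. A class
$\mathcal{C}$ of subsets of $\mathbb{R}^d$ is said to shatter a set
$X \subseteq \mathbb{R}^d$ if every $Y \subseteq X$ is cut out of $X$ by some
$B \in \mathcal{C}$. The VC dimension of $\mathcal{C}$, denoted by
$\mathrm{VCdim}(\mathcal{C})$, is defined to be the maximum $n$ (or $\infty$ if
no such maximum exists) for which some subset of $\mathbb{R}^d$ of cardinality
$n$ is shattered by $\mathcal{C}$.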
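The definitions above can be checked by brute force on small finite examples. The following is a minimal sketch, not part of the original argument: the helper names (`cuts_out`, `is_shattered`, `vc_dim_on_ground_set`) and the toy class of open intervals with grid endpoints are assumptions made for illustration, and the search is restricted to a finite ground set, so it only computes the largest shattered subset of that ground set.

```python
from itertools import combinations

def cuts_out(X, B):
    """Return the subset of X cut out by the set B (given as a predicate)."""
    return frozenset(x for x in X if B(x))

def is_shattered(X, sets):
    """True if every subset of X equals the intersection of X with some B in `sets`."""
    realized = {cuts_out(X, B) for B in sets}
    return len(realized) == 2 ** len(X)

def vc_dim_on_ground_set(ground, sets):
    """Largest n such that some n-element subset of `ground` is shattered by `sets`."""
    best = 0
    for n in range(1, len(ground) + 1):
        if any(is_shattered(S, sets) for S in combinations(ground, n)):
            best = n
        else:
            break  # subsets of shattered sets are shattered, so larger n cannot succeed
    return best

# Hypothetical toy class: open intervals (lo, hi) with endpoints on a half-integer grid.
grid = [k / 2 for k in range(-1, 12)]
intervals = [(lambda x, lo=lo, hi=hi: lo < x < hi)
             for lo in grid for hi in grid if lo < hi]

points = [1, 2, 3, 4, 5]
print(vc_dim_on_ground_set(points, intervals))  # prints 2
```

Two points are shattered by intervals, while no interval contains the two outer points of a triple without the middle one, which is why the search stops at 2.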

The VC dimension of a class measures the complexity of the class, and is
employed in empirical process theory [?], statistical and computational learning
theory [?, ?] and discrete geometry [?]. Although asymptotic estimates of VC
dimensions are given for many classes, the exact values of VC dimensions are
known for only a few classes (e.g. the class of Euclidean balls, the class of
halfspaces [6], and so on).

In Section 2 we prove:

Theorem 1. The class of $d$-dimensional ellipsoids has VC dimension
$(d^2 + 3d)/2$.

Here, by a $d$-dimensional ellipsoid, we mean an open set
$\{x \in \mathbb{R}^d \;;\; {}^t(x - \mu) A (x - \mu) < 1\}$ where
$\mu \in \mathbb{R}^d$ and $A \in \mathbb{R}^{d \times d}$ is positive definite.
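As a small numerical companion to Theorem 1, the sketch below evaluates $(d^2 + 3d)/2$ and compares it with the number of monomials $x_i^2$, $x_i x_j$ ($i < j$) and $x_i$ in $d$ variables, which are the coordinates used later in the proof of Lemma 3. It also implements the membership test from the definition of an ellipsoid. The function names are hypothetical, and the matrix $A$ is assumed to be given as a list of rows and to be positive definite (this is not checked).

```python
from itertools import combinations

def lifted_monomials(d):
    """Monomials x_i^2, x_i*x_j (i < j) and x_i in d variables, as index tuples."""
    squares = [(i, i) for i in range(d)]
    cross = list(combinations(range(d), 2))
    linear = [(i,) for i in range(d)]
    return squares + cross + linear

def ellipsoid_vc_dim(d):
    """The value (d^2 + 3d)/2 stated in Theorem 1 (d^2 + 3d is always even)."""
    return (d * d + 3 * d) // 2

def in_ellipsoid(x, mu, A):
    """Test t(x - mu) A (x - mu) < 1; A is assumed positive definite (list of rows)."""
    v = [xi - mi for xi, mi in zip(x, mu)]
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n)) < 1

for d in range(1, 6):
    print(d, ellipsoid_vc_dim(d), len(lifted_monomials(d)))
```

For $d = 1$ the value is 2, the familiar VC dimension of open intervals on the line.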

In Section 3 we use a part of Theorem 1 (Lemma 3) to study statistical
models. In statistics and statistical learning theory, the class of
$d$-dimensional ellipsoids is induced from the class $\mathcal{G}_d$ of
$d$-dimensional Gaussian distributions: a $d$-dimensional Gaussian distribution
with mean $\mu \in \mathbb{R}^d$ and covariance matrix
$\Sigma \in \mathbb{R}^{d \times d}$ is, by definition, a probability density
function

$$(2\pi)^{-d/2} \, |\det \Sigma|^{-1/2} \exp\left(-\,{}^t(x - \mu)\,\Sigma^{-1}(x - \mu)/2\right), \qquad (x \in \mathbb{R}^d)$$

where a covariance matrix of size $d$ is, by definition, a real, positive
definite matrix. As in statistical learning theory [?], for a class
$\mathcal{P}$ of probability density functions we consider the class
$\mathcal{D}(\mathcal{P})$ of sets $\{x \in \mathbb{R}^d \;;\; f(x) > s\}$ such
that $f$ is any probability density function in $\mathcal{P}$ and $s$ is any
positive real number. Then $\mathcal{D}(\mathcal{G}_d)$ is the class of
$d$-dimensional ellipsoids.
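The identification of Gaussian superlevel sets with ellipsoids can be made explicit: taking logarithms, $f(x) > s$ is equivalent to ${}^t(x-\mu)\Sigma^{-1}(x-\mu) < c$ with $c = -2\log\bigl(s\,(2\pi)^{d/2}|\det\Sigma|^{1/2}\bigr)$, so the ellipsoid matrix is $A = \Sigma^{-1}/c$ whenever $c > 0$. The sketch below checks this numerically for $d = 2$. The function names and the particular mean, covariance and level are illustrative assumptions, and the threshold $s$ is assumed to lie below the peak of the density.

```python
import math

def gaussian_density(x, mu, sigma):
    """Density of a 2-dimensional Gaussian with mean mu and covariance matrix sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    v = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(v[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

def level_set_ellipsoid(sigma, s):
    """Matrix A with {f > s} = {t(x - mu) A (x - mu) < 1}; requires 0 < s < max f."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    radius2 = -2 * math.log(s * 2 * math.pi * math.sqrt(det))
    if radius2 <= 0:
        raise ValueError("s must be below the peak density")
    return [[d / det / radius2, -b / det / radius2],
            [-c / det / radius2, a / det / radius2]]

# Illustrative parameters (assumptions, not taken from the paper).
MU, SIGMA, LEVEL = (1.0, -1.0), [[2.0, 0.5], [0.5, 1.0]], 0.05
A = level_set_ellipsoid(SIGMA, LEVEL)

def in_ellipsoid(x):
    """Membership in the ellipsoid with centre MU and matrix A."""
    v = [x[0] - MU[0], x[1] - MU[1]]
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2)) < 1

grid = [(i / 2, j / 2) for i in range(-10, 11) for j in range(-10, 11)]
agree = all((gaussian_density(p, MU, SIGMA) > LEVEL) == in_ellipsoid(p) for p in grid)
print(agree)
```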
For a positive integer $N$, an $N$-component $d$-dimensional Gaussian mixture
model [?] ($(N, d)$-GMM) is, by definition, any probability distribution
belonging to the convex hull of some $N$ $d$-dimensional Gaussian distributions.
Suppose we are given a sample from a population $(N, d)$-GMM but the number $N$
of the components is unknown. To select $N$ from the sample is an example of
Akaike's model selection problem [?] (see [?] for a recent approach). The
authors of [?] proposed to choose $N$ by the structural risk minimization
principle [?], where an important role is played by the VC dimension of the
class $\mathcal{D}((\mathcal{G}_d)_N)$ with $(\mathcal{G}_d)_N$ being the class
of $(N, d)$-GMMs. Our result is that the VC dimension of
$\mathcal{D}((\mathcal{G}_d)_N)$ is greater than or equal to $N(d^2 + 3d)/2$.



2. VC dimension of ellipsoids 

We will prove Theorem 1. For a positive integer $B$, a vector
$a \in \mathbb{R}^B \setminus \{0\}$, and $c \in \mathbb{R}$, we write an affine
function $\ell_{a,c}(x) := {}^t a x + c$ ($x \in \mathbb{R}^B$) and an open
halfspace $H_{a,c} := \{x \in \mathbb{R}^B \;;\; \ell_{a,c}(x) < 0\}$. We say a
set $W \subseteq \mathbb{R}^B$ spans an affine subspace
$H \subseteq \mathbb{R}^B$, if $H$ is the smallest affine subspace that contains
$W$. The cardinality of a set $S$ is denoted by $|S|$. For a vector
$a = {}^t(a_1, \ldots, a_B) \in \mathbb{R}^B$, let $\|a\|_\infty$ be
$\max\{|a_i| \;;\; 1 \le i \le B\}$.

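Lemma 2. For any $a \in \mathbb{R}^B \setminus \{0\}$ and any
$S \subseteq \mathbb{R}^B$ with $|S| = B$, if $S$ spans a hyperplane
$\{x \in \mathbb{R}^B \;;\; \ell_{a,-1}(x) = 0\}$, then $S$ is shattered by a
class $\{H_{b,-1} \;;\; b \in \mathbb{R}^B \setminus \{0\},\ \|b - a\|_\infty < \varepsilon\}$
for any $\varepsilon > 0$.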

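Proof. By an affine transformation we can assume without loss of generality
that all the components of the vector $a$ are 1 and that $S$ is the canonical
basis $\{e_1, \ldots, e_B\}$ of $\mathbb{R}^B$. Suppose $\|b - a\|_\infty$ is
less than $\varepsilon > 0$. Since $b \ne 0$, the set $H_{b,-1}$ is a genuine
open halfspace. Then the vector $e_i$ belongs to the open halfspace $H_{b,-1}$
if and only if the $i$-th component of $b$ is less than 1. Hence every subset
of $S$ is cut out by choosing each component of $b$ slightly below or slightly
above 1. □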
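The normalized situation in the proof of Lemma 2 is easy to verify mechanically. The sketch below, with hypothetical function names, builds for every subset of the canonical basis a vector $b$ with $\|b - (1,\ldots,1)\|_\infty < \varepsilon$ and checks that $H_{b,-1}$ cuts out exactly that subset. It assumes $\varepsilon$ is a moderate floating-point value, so that $1 \pm \varepsilon/2$ is distinguishable from 1.

```python
from itertools import product

def in_halfspace(b, x):
    """Membership in H_{b,-1} = {x : <b, x> - 1 < 0}."""
    return sum(bi * xi for bi, xi in zip(b, x)) - 1 < 0

def witnesses(B, eps):
    """Map each subset of {e_1, ..., e_B} (a tuple of booleans) to a vector b near (1, ..., 1)."""
    return {mask: tuple(1 - eps / 2 if inside else 1 + eps / 2 for inside in mask)
            for mask in product([False, True], repeat=B)}

def shatters_canonical_basis(B, eps):
    """Check that the class {H_{b,-1} : ||b - 1||_inf < eps} shatters the canonical basis."""
    basis = [tuple(1 if i == j else 0 for j in range(B)) for i in range(B)]
    for mask, b in witnesses(B, eps).items():
        close = max(abs(bi - 1) for bi in b) < eps
        cut = tuple(in_halfspace(b, e) for e in basis)
        if not close or cut != mask:
            return False
    return True

print(shatters_canonical_basis(4, 0.01))  # prints True
```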

Lemma 3. The class of $d$-dimensional ellipsoids has VC dimension greater than
or equal to $(d^2 + 3d)/2$.

Proof. Let $B$ be the right-hand side. Let $\varphi$ be a map
$S^{d-1} \to \mathbb{R}^B$ on the unit sphere $S^{d-1}$ of $\mathbb{R}^d$ which
maps $x = {}^t(x_1, \ldots, x_d)$ to
${}^t(x_1^2, \ldots, x_d^2, x_1 x_2, \ldots, x_{d-1} x_d, x_1, \ldots, x_d)$.
Let ${}^t(\xi_1, \ldots, \xi_B)$ be a coordinate of $\mathbb{R}^B$. Then the
image $\varphi(S^{d-1})$ spans a hyperplane $\xi_1 + \cdots + \xi_d - 1 = 0$.
So there is some set $S \subseteq S^{d-1}$ such that $|S| = B$ and
$\varphi(S)$ spans the hyperplane. Let $a \in \mathbb{R}^B$ be a vector with
the first $d$ components being 1 and the other components being 0. By Lemma 2,
for any $\varepsilon > 0$ the family
$\{H_{b,-1} \;;\; b \in \mathbb{R}^B \setminus \{0\},\ \|b - a\|_\infty < \varepsilon\}$
shatters $\varphi(S)$. By the definition of $\varphi$, the class of sets
defined by quadratic inequalities

$$b_1 x_1^2 + \cdots + b_d x_d^2 + b_{d+1} x_1 x_2 + \cdots + b_B x_d - 1 < 0 \qquad (\|b - a\|_\infty < \varepsilon)$$

shatters $S$. But, when $\varepsilon$ is sufficiently small, all of these sets
are ellipsoids. □
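For $d = 2$ the construction in the proof of Lemma 3 can be carried out exactly with rational arithmetic. The sketch below is an illustration under stated assumptions: the five points on the unit circle, the margin `eta = 1/10` and all function names are choices made here, not taken from the paper. For each subset of the five points it solves for coefficients $b$ with $\langle b, \varphi(p)\rangle = 1 - \eta$ on the chosen points and $1 + \eta$ on the others, then checks that the quadratic part is positive definite, so that the region is an ellipse.

```python
from fractions import Fraction as F
from itertools import product

def phi(x):
    """Lifting map of the proof for d = 2: (x1^2, x2^2, x1*x2, x1, x2)."""
    x1, x2 = x
    return [x1 * x1, x2 * x2, x1 * x2, x1, x2]

def solve(M, t):
    """Solve M b = t exactly by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    rows = [list(M[i]) + [t[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        pivot = rows[col][col]
        rows[col] = [v / pivot for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

# Five points on the unit circle (an assumed choice); their images span xi_1 + xi_2 = 1.
S = [(F(1), F(0)), (F(0), F(1)), (F(-1), F(0)), (F(0), F(-1)), (F(3, 5), F(4, 5))]
M = [phi(p) for p in S]
eta = F(1, 10)

def ellipse_cutting(mask):
    """Coefficients b with <b, phi(p)> = 1 - eta on chosen points and 1 + eta elsewhere."""
    return solve(M, [1 - eta if inside else 1 + eta for inside in mask])

def is_ellipse(b):
    """The region {<b, phi(x)> < 1} is an ellipse when the quadratic part is positive definite."""
    return b[0] > 0 and 4 * b[0] * b[1] > b[2] ** 2

def ellipses_shatter():
    """Check that every subset of S is cut out by one of the constructed ellipses."""
    for mask in product([False, True], repeat=len(S)):
        b = ellipse_cutting(mask)
        cut = tuple(sum(bi * vi for bi, vi in zip(b, phi(p))) < 1 for p in S)
        if cut != mask or not is_ellipse(b):
            return False
    return True

print(ellipses_shatter())  # prints True
```

This exhibits five points in the plane shattered by ellipses, matching $(2^2 + 3 \cdot 2)/2 = 5$.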

We verify the converse inequality. 

Lemma 4. $\mathrm{VCdim}(\{H_{a,c} \;;\; a = {}^t(a_1, \ldots, a_B) \in \mathbb{R}^B,\ a_B \ge 0,\ c \in \mathbb{R}\}) \le B$
for any positive integer $B$.

Below, the convex hull of a set $A$ is denoted by $\mathrm{conv}(A)$.

Proof. Let $\mathcal{C}$ be
$\{H_{a,c} \;;\; a = {}^t(a_1, \ldots, a_B) \in \mathbb{R}^B,\ a_B \ge 0,\ c \in \mathbb{R}\}$.
Assume $\mathrm{VCdim}(\mathcal{C}) > B$. Then $\mathcal{C}$ shatters some set
$S \subseteq \mathbb{R}^B$ such that $|S| = B + 1$.

If there are $x = (u, x_B),\ y = (u, y_B) \in S$ such that $x_B < y_B$, then for
any $a \in \mathbb{R}^B$ with the last component nonnegative and for any
$c \in \mathbb{R}$ we have $\ell_{a,c}(x) \le \ell_{a,c}(y)$, and thus
$x \in H_{a,c} = \{x \in \mathbb{R}^B \;;\; \ell_{a,c}(x) < 0\}$ whenever
$y \in H_{a,c}$. This contradicts the assumption "$\mathcal{C}$ shatters $S$."
Therefore, for the canonical projection
$\pi : \mathbb{R}^B \to \mathbb{R}^{B-1}$ we have $|\pi(S)| = B + 1$.

By applying Radon's theorem^1 to the set $\pi(S) \subseteq \mathbb{R}^{B-1}$,
there is a partition $(T_1, T_2)$ of $S$ such that we can take $y$ from
$\mathrm{conv}(\pi(T_1)) \cap \mathrm{conv}(\pi(T_2))$. Then we see that there
are $z, z' \in \mathbb{R}$ such that $(y, z) \in \mathrm{conv}(T_1)$ and
$(y, z') \in \mathrm{conv}(T_2)$. Because $\mathcal{C}$ shatters $S$, there are
some $a \in \mathbb{R}^B$ and some $c \in \mathbb{R}$ such that the last
component $a_B$ of $a$ is nonnegative and a halfspace $H_{a,c} \in \mathcal{C}$
cuts $T_1$ out of $S$. Thus, we have $\ell_{a,c}(x) < 0$ for all
$x \in \mathrm{conv}(T_1)$ while $\ell_{a,c}(x) \ge 0$ for all
$x \in \mathrm{conv}(T_2)$, where $T_2 = S \setminus T_1$. Since
$\ell_{a,c}(y, z) < 0 \le \ell_{a,c}(y, z')$ and $a_B \ge 0$, we have $z' > z$.
On the other hand, some member $H_{a',c'} \in \mathcal{C}$ cuts $T_2$ out of
$S$. By a similar reasoning, we have $z > z'$, which is a contradiction. □
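Radon's theorem, used in the proof above, is constructive: for $d + 2$ points in $\mathbb{R}^d$, a nonzero solution $\lambda$ of $\sum_i \lambda_i x_i = 0$, $\sum_i \lambda_i = 0$ splits the points by the sign of $\lambda_i$, and the normalized positive part gives a common point of the two convex hulls. The following sketch computes such a partition with exact rational arithmetic. The function names and the four sample points are assumptions made for illustration.

```python
from fractions import Fraction as F

def null_vector(rows, n):
    """A nonzero rational solution lam of the homogeneous system rows * lam = 0 (n unknowns)."""
    rows = [[F(v) for v in r] for r in rows]
    pivots = []  # (row index, column) of each pivot after Gauss-Jordan elimination
    r = 0
    for col in range(n):
        if r == len(rows):
            break
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        p = rows[r][col]
        rows[r] = [v / p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
    pivot_cols = {c for _, c in pivots}
    free = next(c for c in range(n) if c not in pivot_cols)
    lam = [F(0)] * n
    lam[free] = F(1)
    for i, c in pivots:
        lam[c] = -rows[i][free]
    return lam

def radon_partition(points):
    """Index sets T1, T2 and a common point of conv(T1) and conv(T2) for d + 2 points in R^d."""
    d, n = len(points[0]), len(points)
    rows = [[p[k] for p in points] for k in range(d)] + [[1] * n]
    lam = null_vector(rows, n)
    T1 = [i for i in range(n) if lam[i] > 0]
    T2 = [i for i in range(n) if lam[i] <= 0]
    total = sum(lam[i] for i in T1)
    common = tuple(sum(lam[i] * points[i][k] for i in T1) / total for k in range(d))
    return T1, T2, common

pts = [(0, 0), (4, 0), (0, 4), (1, 1)]
T1, T2, y = radon_partition(pts)
print(T1, T2, y)
```

In the sample, the point $(1, 1)$ lies inside the triangle spanned by the other three points, so it forms one part of the partition by itself and is also the common point.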

Corollary 5. If $A \subseteq \mathbb{R}^B \setminus \{0\}$ and
$\mathrm{VCdim}(\{H_{a,c}\}_{a \in A,\, c \in \mathbb{R}}) > B$, then
$0 \in \mathrm{conv}(A)$.

Proof. Let $0 \notin \mathrm{conv}(A)$. Then for every finite subset $A'$ of
$A$, $0 \notin \mathrm{conv}(A')$ and there is a hyperplane $J$ through $0$
such that $\mathrm{conv}(A')$ is contained in one of the two open halfspaces
determined by $J$. So there is a new rectangular coordinate system such that
the origin is the same as in the old rectangular coordinate system, one of the
new coordinate axes is normal to $J$, and any $a \in A'$ is represented as
$(a_1, \ldots, a_B)$ with $a_B > 0$. So
$\mathrm{VCdim}(\{H_{a,c}\}_{a \in A',\, c \in \mathbb{R}}) \le B$ by Lemma 4,
and thus $\mathrm{VCdim}(\{H_{a,c}\}_{a \in A,\, c \in \mathbb{R}}) \le B$. □



^1 Any set of $(d + 2)$ points in $\mathbb{R}^d$ can be partitioned into two
disjoint sets whose convex hulls intersect.
The proof of Theorem 1 is as follows: By Lemma 3 we have only to establish
that the class of $d$-dimensional ellipsoids has VC dimension less than or
equal to $B := (d^2 + 3d)/2$. Assume otherwise. For
$a = {}^t(a_1, \ldots, a_B) \in \mathbb{R}^B$ and
$x = {}^t(x_1, \ldots, x_d)$, define a quadratic form $q_a(x)$ and a quadratic
polynomial $p_a(x)$ by

$$q_a(x) := a_1 x_1^2 + \cdots + a_d x_d^2 + a_{d+1} x_1 x_2 + \cdots + a_{B-d} x_{d-1} x_d,$$

$$p_a(x) := q_a(x) + a_{B-d+1} x_1 + \cdots + a_B x_d.$$

Let $A$ be the set of $a \in \mathbb{R}^B$ such that $q_a$ is positive
definite. Obviously, $A$ is convex and $0 \notin A$. Then, our assumption
implies $\mathrm{VCdim}(\{H_{a,c}\}_{a \in A,\, c \in \mathbb{R}}) > B$, since
for any ellipsoid $E$, there exist $a \in A$ and $c \in \mathbb{R}$ such that
$E = \{x \in \mathbb{R}^d \;;\; p_a(x) < -c\}$, so the image of a set shattered
by ellipsoids under the injective map
$x \mapsto {}^t(x_1^2, \ldots, x_d^2, x_1 x_2, \ldots, x_{d-1} x_d, x_1, \ldots, x_d)$
is shattered by these halfspaces. Hence Corollary 5 shows that
$0 \in \mathrm{conv}(A) = A$, which is a contradiction. □

3. A lower bound of VC dimension of GMMs

For a positive integer $N$ and a class $\mathcal{P}$ of probability density
functions, let $(\mathcal{P})_N$ be the class of probability density functions
$p_1 f_1 + \cdots + p_N f_N$ such that $f_1, \ldots, f_N \in \mathcal{P}$,
$p_i > 0$ and $p_1 + \cdots + p_N = 1$. For $X \subseteq \mathbb{R}^d$ and
$t \in \mathbb{R}^d$, put $X + t := \{x + t \;;\; x \in X\}$. The Euclidean
norm of a vector $x$ is denoted by $\|x\|$. Let
$\mathrm{diam}\, X = \sup\{\|x - x'\| \;;\; x, x' \in X\}$.

Lemma 6. If a class $\mathcal{P}$ of probability density functions on
$\mathbb{R}^d$ satisfies

1. for all $f(x) \in \mathcal{P}$ and $t \in \mathbb{R}^d$ we have $f(x + t) \in \mathcal{P}$; and

2. for any $\varepsilon > 0$ there exists $a > 0$ such that $f(x) < \varepsilon$ whenever $\|x\| > a$,

then $\mathrm{VCdim}(\mathcal{D}((\mathcal{P})_N)) \ge N \times \mathrm{VCdim}(\mathcal{D}(\mathcal{P}))$.

Proof. Suppose $X \subseteq \mathbb{R}^d$ is shattered by
$\mathcal{D}(\mathcal{P})$. Then for each $Y \subseteq X$ there exist
$g_Y \in \mathcal{P}$ and $r_Y \in \mathbb{R}$ such that

$$Y = X \cap D_Y, \qquad D_Y = \{x \in \mathbb{R}^d \;;\; g_Y(x) > e^{-r_Y}\}. \qquad (1)$$

When there is $z \in X \setminus Y$ such that $-\log g_Y(z)$ is equal to
$r_Y$, we take a smaller $r_Y > \max\{-\log g_Y(x) \;;\; x \in Y\}$ with the
condition (1) kept. Then

$$q := \min\{-r_{Y_j} - \log g_{Y_j}(z) \;;\; z \in X \setminus Y_j,\ Y_j \subseteq X,\ 1 \le j \le N\} \qquad (2)$$

is well-defined and positive. Let $\delta > 0$ be smaller than this and all of
$r_{Y_j} + \log g_{Y_j}(x)$ where $x \in Y_j \subseteq X$ and $1 \le j \le N$.

By the assumptions (1) and (2), we can prove that for any
$j \in \{1, \ldots, N\}$, for any $\varepsilon > 0$, for any
$t_1, \ldots, t_N \in \mathbb{R}^d$ with $\|t_i - t_j\| > \mathrm{diam}\, X$
($i \ne j$), we have (i) $U := \bigcup_{i=1}^{N} (X + t_i)$ has cardinality
$N|X|$, and (ii) for any $x \in X$, $Y_1 \subseteq X, \ldots, Y_N \subseteq X$,
for $p_{Y_i} := \exp(r_{Y_i}) / \sum_{j=1}^{N} \exp(r_{Y_j})$
($1 \le i \le N$),
Then the sum of the leftmost term and the rightmost term, which we write
$f(x)$, is a member of $(\mathcal{P})_N$ and satisfies
$0 \le \log f(x + t_j) - \log\bigl(p_{Y_j} g_{Y_j}(x)\bigr) < \varepsilon / \bigl(p_{Y_j} g_{Y_j}(x)\bigr)$,
since $\log(1 + u) \le u$ for any $u \ge 0$. Because $U$ is the disjoint union
of $X + t_i$ over $1 \le i \le N$, every subset $V \subseteq U$ has a unique
sequence $(Y_i)_{i=1}^{N}$ of subsets of $X$ such that
$V = \bigcup_{i=1}^{N} (Y_i + t_i)$. So, we can define
$r_V := \log \sum_{i=1}^{N} \exp(r_{Y_i})$, and
$\log p_{Y_i} = r_{Y_i} - r_V$. Hence, there exist
$t_1, \ldots, t_N \in \mathbb{R}^d$ such that
$\|t_i - t_j\| > \mathrm{diam}(X)$ ($i \ne j$) and for any $x \in X$,
$Y_1 \subseteq X, \ldots, Y_N \subseteq X$, $j \in \{1, \ldots, N\}$, we have

$$0 \le \bigl(r_V + \log f(x + t_j)\bigr) - \bigl(r_{Y_j} + \log g_{Y_j}(x)\bigr) < \delta. \qquad (3)$$

Define $C_V := \{x \in \mathbb{R}^d \;;\; f(x) > \exp(-r_V)\}$, and suppose
$x \in X$ and $j \in \{1, \ldots, N\}$. Assume $x \in Y_j$. By (1),
$0 < r_{Y_j} + \log g_{Y_j}(x)$. By (3),
$0 < r_{Y_j} + \log g_{Y_j}(x) \le r_V + \log f(x + t_j)$. Therefore
$x + t_j \in C_V$. On the other hand, assume $x \in X \setminus Y_j$. By (3)
and (2),
$r_V + \log f(x + t_j) < r_{Y_j} + \log g_{Y_j}(x) + \delta < r_{Y_j} + \log g_{Y_j}(x) + q \le 0$.
Thus $x + t_j \notin C_V$. To sum up, for any $x \in X$ and any
$j \in \{1, \ldots, N\}$, we have $x \in Y_j \iff x + t_j \in C_V$. Hence
$U \cap C_V = V$. Thus $U$ is shattered by $\mathcal{D}((\mathcal{P})_N)$. □
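The translation argument of Lemma 6 can be tried out numerically in the simplest case $d = 1$, $N = 2$, where $\mathcal{D}(\mathcal{G}_1)$ consists of open intervals and has VC dimension 2. The sketch below follows the construction of the proof: it fixes witnesses $(g_Y, r_Y)$ for each $Y \subseteq X = \{-1, 1\}$, forms the mixture with weights $p_{Y_i} = \exp(r_{Y_i} - r_V)$, and checks that $U \cap C_V = V$ for all 16 choices of $(Y_1, Y_2)$. The specific means, thresholds, the shift 20 and all names are assumptions chosen for illustration.

```python
import math
from itertools import product

def gauss(x, mu, sigma=1.0):
    """Density of the 1-dimensional Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

X = (-1.0, 1.0)
# For every subset Y of X: (mean of g_Y, r_Y) with g_Y(x) > exp(-r_Y) exactly for x in Y.
WITNESS = {
    (): (0.0, 1.0),
    (-1.0,): (-1.0, 2.0),
    (1.0,): (1.0, 2.0),
    (-1.0, 1.0): (0.0, 2.0),
}
SHIFTS = (0.0, 20.0)  # t_1, t_2 with |t_1 - t_2| > diam X = 2

def mixture_cut(Ys):
    """Return the intersection of U with C_V, where V is the union of the sets Y_i + t_i."""
    r_V = math.log(sum(math.exp(WITNESS[Y][1]) for Y in Ys))

    def f(y):
        # Mixture of translated densities g_{Y_i}(y - t_i) with weights exp(r_{Y_i} - r_V).
        return sum(math.exp(WITNESS[Y][1] - r_V) * gauss(y - t, WITNESS[Y][0])
                   for Y, t in zip(Ys, SHIFTS))

    U = [x + t for t in SHIFTS for x in X]
    return {u for u in U if f(u) > math.exp(-r_V)}

def shattered():
    """Check that every subset V of U is cut out by a superlevel set of a 2-component mixture."""
    for Ys in product(list(WITNESS), repeat=len(SHIFTS)):
        V = {x + t for Y, t in zip(Ys, SHIFTS) for x in Y}
        if mixture_cut(Ys) != V:
            return False
    return True

print(shattered())  # prints True
```

The four points $-1, 1, 19, 21$ are thus shattered, in agreement with the bound $N \times \mathrm{VCdim}(\mathcal{D}(\mathcal{P})) = 2 \times 2$.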

By Lemma 3 and Lemma 6 we have:

Corollary 7. The VC dimension of $(N, d)$-GMMs is greater than or equal to
$N(d^2 + 3d)/2$. In other words, for the class $(\mathcal{G}_d)_N$ of
$(N, d)$-GMMs, the class $\mathcal{D}((\mathcal{G}_d)_N)$ has VC dimension
greater than or equal to $N(d^2 + 3d)/2$.
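For orientation, the bound of Corollary 7 can be tabulated next to the usual count of free parameters of an $(N, d)$-GMM ($N - 1$ mixing weights, $Nd$ mean coordinates and $Nd(d+1)/2$ covariance entries). This comparison is a side remark added here, not a claim of the paper, and the function names are hypothetical. The two quantities differ by exactly $N - 1$.

```python
def gmm_vc_lower_bound(N, d):
    """Lower bound N(d^2 + 3d)/2 of Corollary 7 (d^2 + 3d is always even)."""
    return N * (d * d + 3 * d) // 2

def gmm_free_parameters(N, d):
    """Free parameters of an (N, d)-GMM: N - 1 weights, N*d means, N*d(d+1)/2 covariance entries."""
    return (N - 1) + N * d + N * d * (d + 1) // 2

for N, d in [(1, 1), (2, 1), (2, 2), (3, 2), (5, 3)]:
    print(N, d, gmm_vc_lower_bound(N, d), gmm_free_parameters(N, d))
```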

4. Conclusion 

We can easily obtain an asymptotically tight estimate of the VC dimension of
the class of $d$-dimensional ellipsoids through the combination of a naive
linearization argument [?] and an approximation argument of "affine subspaces"
(bands [?], more precisely) by ellipsoids. However, in Section 2 we have
provided the exact value of the VC dimension, by combining a linearization
argument [6, 10] with an argument about convex bodies. Our argument seems
useful to establish the VC dimension of the class of bounded sets
$\{x \in \mathbb{R}^d \;;\; p(x) > 0\}$ such that $p$ is any real polynomial
with bounded degree.